1. (Currently Amended) A method of providing speaker recognition, said
method comprising the steps of:

providing a model corresponding to a target speaker, the model being resolved
hierarchically into at least one frame comprising a plurality of levels of
phonetic detail of varying resolution;

receiving an identity claim, wherein the identity claim is a test utterance, and
further wherein features are extracted from the test utterance;

ascertaining whether the identity claim corresponds to the target speaker model;

said ascertaining step comprising the steps of:

determining, for each frame and each level of phonetic detail of the target
speaker model, a likelihood value; and

resolving the at least one likelihood value to obtain a likelihood score;

wherein the likelihood values are determined utilizing grain-specific weights.
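As an illustration only, the ascertaining step of Claim 1 might be sketched in Python as follows. The sketch assumes the per-frame likelihoods are already computed and stored as `frames[t][i][j]` (frame t, level i, unit j), that the grain-specific weights are non-negative and sum to one within each frame, and that the values are resolved by averaging over frames as recited in Claim 3. The function names and the data layout are hypothetical and are not taken from the claims.

```python
from typing import List

# Levels[i][j]: value for unit j on level i of phonetic detail (hypothetical layout).
Levels = List[List[float]]


def frame_value(likelihoods: Levels, weights: Levels) -> float:
    """Combine one frame's likelihoods across all levels and units.

    Assumption: `weights` has the same shape as `likelihoods` and its
    entries sum to one for the frame.
    """
    return sum(
        w * p
        for level_p, level_w in zip(likelihoods, weights)
        for p, w in zip(level_p, level_w)
    )


def likelihood_score(frames: List[Levels], weights: List[Levels]) -> float:
    """Resolve the per-frame values into a single score by averaging over frames."""
    if not frames:
        raise ValueError("test utterance has no frames")
    return sum(frame_value(f, w) for f, w in zip(frames, weights)) / len(frames)
```

Averaging is only one way of resolving the values; Claim 1 itself does not limit the resolving step to it.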


PAGE 7/19 * RCVD AT 6/21/2007 4:52:47 PM [Eastern Daylight Time] ' SVR:U8PTO-EFXRF-3/3^ DNIS:273830fl * CSID:412 741 9292 * DURATION (min-ss):04-18 



06-21-'07 16:54 FROM- 412-741-9292 T-880 P008/019 F-284

Atty. Docket No. YOR9-2000-0168US1
(590.014)



2. (Previously Presented) The method according to Claim 1, wherein, for each
frame and each level of phonetic detail, the likelihood value is a maximum likelihood
value.



3. (Original) The method according to Claim 2, wherein said step of resolving
the at least one likelihood value comprises averaging the at least one likelihood value.

4. (Original) The method according to Claim 3, wherein the likelihood value is
determined via the following general equation:

[equation illegible in source]

wherein b_{ij(i,t)} corresponds to grain-specific weights that satisfy

[condition illegible in source]

and further wherein:

S is the likelihood score;

U is a test utterance, comprising T frames u_1, ..., u_T;

M(i,j) is a speaker model, with 1 ≤ i ≤ L levels of detail and with 1 ≤ j ≤ K(i) units
on the i-th level; and

P(u_t|M(i,j)) is the probability that a frame u_t corresponds to a speaker model unit j
on the i-th level of phonetic detail of the speaker model.
5. (Original) The method according to Claim 4, wherein the likelihood score is
determined by the following equation:

$$S(U \mid M) = \frac{1}{T} \sum_{t=1}^{T} \max_{i,j} P(u_t \mid M(i,j)).$$
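Assuming the equation of Claim 5 averages, over the T frames, the largest likelihood found among all units on all levels (which is how Claims 2 and 3 describe the determining and resolving steps), the score might be computed as in the following Python sketch. The name `pick_max_score` and the nested-list layout `frames[t][i][j]` are hypothetical.

```python
from typing import List

# Levels[i][j]: likelihood P(u_t | M(i, j)) for unit j on level i (hypothetical layout).
Levels = List[List[float]]


def pick_max_score(frames: List[Levels]) -> float:
    """Average, over frames, of the maximum likelihood across all levels and units."""
    if not frames:
        raise ValueError("test utterance has no frames")
    # For each frame, keep only the best-matching unit regardless of its level.
    per_frame_max = [max(p for level in frame for p in level) for frame in frames]
    return sum(per_frame_max) / len(per_frame_max)
```

Under this reading, taking the maximum is the special case of a weighted combination in which each frame's entire weight falls on its best-matching unit.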

6. (Original) The method according to Claim 1, wherein the at least one level of
phonetic detail comprises at least one of the following: a global level; a phonemic level
and a sub-phonemic level.

7. (Original) The method according to Claim 6, wherein the at least one level of 
phonetic detail comprises all of the following three levels: a global level; a phonemic 
level and a sub-phonemic level. 

8. (Original) The method according to Claim 7, wherein said step of providing a
model corresponding to a target speaker comprises creating said target speaker model on
the basis of training utterances and providing labeling information for each frame.

9. (Original) The method according to Claim 1, wherein said ascertaining step 
further comprises accepting or rejecting the identity claim. 

10. (Original) The method according to Claim 9, wherein said step of accepting
or rejecting comprises comparing a quantity based on the likelihood score to a 
predetermined threshold value. 
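The accept-or-reject step of Claims 9 and 10 might be sketched as a single comparison. The sketch assumes that a larger quantity indicates a better match to the target speaker and that a quantity equal to the threshold is accepted; neither convention is stated in the claims, and the function name is hypothetical.

```python
def accept_identity_claim(quantity: float, threshold: float) -> bool:
    """Return True to accept the identity claim, False to reject it.

    Assumptions: higher `quantity` means a better match, and a tie with
    the predetermined threshold counts as acceptance.
    """
    return quantity >= threshold
```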
11. (Original) The method according to Claim 10, further comprising the steps
of:

providing at least one model corresponding to at least one background speaker;
and

determining the quantity based on the likelihood score via employing the at least
one background speaker model.

12. (Original) The method according to Claim 11, wherein said step of
determining the quantity based on the likelihood comprises determining a log-likelihood
ratio based on the likelihood score.

13. (Previously Presented) The method according to Claim 12, wherein the
log-likelihood ratio is determined by the following equation:

$$L = S(U \mid M) - \frac{1}{C} \sum_{i=1}^{C} S(U \mid BG_i)$$

wherein:

L is the log-likelihood ratio;
S is the likelihood score;
U is a test utterance, comprising T frames u_1, ..., u_T;
M denotes the target speaker model; and
BG_i denotes the i-th background model.
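A minimal Python sketch of the quantity in Claims 11 to 13 follows. It assumes the background term is the arithmetic mean of the scores obtained against each background model, which is one reading of the partly illegible equation, and that the scores S(U|M) and S(U|BG_i) have already been computed. The function and parameter names are hypothetical.

```python
from typing import Sequence


def log_likelihood_ratio(target_score: float, background_scores: Sequence[float]) -> float:
    """Target score minus the mean score over the background speaker models.

    Assumption: the background scores are normalized by their count.
    """
    if not background_scores:
        raise ValueError("at least one background model score is required")
    return target_score - sum(background_scores) / len(background_scores)
```

The result is the kind of quantity that Claim 10 compares with the predetermined threshold value.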


14. (Currently Amended) An apparatus for providing speaker recognition,
said apparatus comprising:

a target speaker model generator for generating a model corresponding to a target
speaker, the model being resolved hierarchically into at least one frame comprising a
plurality of levels of phonetic detail of varying resolution;

a receiving arrangement for receiving an identity claim, wherein the identity claim
is a test utterance, and further wherein features are extracted from the test utterance;

a decision arrangement for ascertaining whether the identity claim corresponds to
the target speaker model;

said decision arrangement being adapted to:

determine, for each frame and each level of phonetic detail of the target
speaker model, a likelihood value; and

resolve the at least one likelihood value to obtain a likelihood score;

wherein the likelihood values are determined utilizing grain-specific weights.

15. (Previously Presented) The apparatus according to Claim 14, wherein, for
each frame and each level of phonetic detail, the likelihood value is a maximum 
likelihood value. 
16. (Original) The apparatus according to Claim 15, wherein said decision
arrangement is adapted to resolve the at least one likelihood value via averaging the at
least one likelihood value.

17. (Original) The apparatus according to Claim 16, wherein the likelihood
value is determined via the following general equation:

[equation illegible in source]

wherein b_{ij(i,t)} corresponds to grain-specific weights that satisfy

[condition illegible in source]

and further wherein:

S is the likelihood score;

U is a test utterance, comprising T frames u_1, ..., u_T;

M(i,j) is a speaker model, with 1 ≤ i ≤ L levels of detail and with 1 ≤ j ≤ K(i) units
on the i-th level; and

P(u_t|M(i,j)) is the probability that a frame u_t corresponds to a speaker model unit j
on the i-th level of phonetic detail of the speaker model.

18. (Original) The apparatus according to Claim 17, wherein the likelihood score
is determined by the following equation:

$$S(U \mid M) = \frac{1}{T} \sum_{t=1}^{T} \max_{i,j} P(u_t \mid M(i,j)).$$



19. (Original) The apparatus according to Claim 14, wherein the at least one
level of phonetic detail comprises at least one of the following: a global level; a
phonemic level and a sub-phonemic level.

20. (Original) The apparatus according to Claim 19, wherein the at least one
level of phonetic detail comprises all of the following three levels: a global level; a
phonemic level and a sub-phonemic level.

21. (Original) The apparatus according to Claim 20, wherein said target speaker
model generator is adapted to generate said target speaker model on the basis of training
utterances and providing labeling information for each frame.

22. (Original) The apparatus according to Claim 14, wherein said decision
arrangement is further adapted to accept or reject the identity claim.

23. (Original) The apparatus according to Claim 22, wherein said decision
arrangement is adapted to accept or reject the identity claim via comparing a quantity
based on the likelihood score to a predetermined threshold value.

24. (Original) The apparatus according to Claim 23, further comprising:

a background speaker model generator for providing at least one model
corresponding to at least one background speaker,

said decision arrangement being adapted to determine the quantity based on the
likelihood score via employing the at least one background speaker model.

25. (Original) The apparatus according to Claim 24, wherein said decision
arrangement is adapted to determine the quantity based on the likelihood via determining
a log-likelihood ratio based on the likelihood score.

26. (Previously Presented) The apparatus according to Claim 25, wherein the
log-likelihood ratio is determined by the following equation:

$$L = S(U \mid M) - \frac{1}{C} \sum_{i=1}^{C} S(U \mid BG_i)$$

wherein:

L is the log-likelihood ratio;
S is the likelihood score;
U is a test utterance, comprising T frames u_1, ..., u_T;
M denotes the target speaker model; and
BG_i denotes the i-th background model.

27. (Currently Amended) A program storage device readable by machine,
tangibly embodying a program of instructions executable by the machine to perform
method steps for providing speaker recognition, said method comprising the steps of:

providing a model corresponding to a target speaker, the model being resolved
hierarchically into at least one frame comprising a plurality of levels of phonetic detail of
varying resolution;

receiving an identity claim, wherein the identity claim is a test utterance, and
further wherein features are extracted from the test utterance;

ascertaining whether the identity claim corresponds to the target speaker model;

said ascertaining step comprising the steps of:

determining, for each frame and each level of phonetic detail of the target
speaker model, a likelihood value; and

resolving the at least one likelihood value to obtain a likelihood score;

wherein the likelihood values are determined utilizing grain-specific weights.






